Abstract

Power law distribution seems to be an important characteristic of web graphs. Several existing
web graph models generate power law graphs by adding new vertices and non-uniform edge
connectivities to existing graphs. Researchers have conjectured that preferential connectivity
and incremental growth are both required for the power law distribution. In this paper, we
propose a different web graph model with power law distribution that does not require
incremental growth. We also provide a comparison of our model with several others in their
ability to predict web graph clustering behavior.

1 Introduction 

The growth of the World Wide Web (WWW) has been explosive and phenomenal. Google searched
more than 2 billion pages as of February 2002. The Internet Archive had archived 10 billion
pages as of March 2001. The existing growth-based models (e.g., [21]) are adequate to explain the
web's current graph structure. It would be interesting to know if a different model will be needed
as the web's growth rate slows down while its link structure continues to evolve.

1.1 Why Power Laws? 

Barabasi et al. [10] and Medina et al. [24] stated that preferential connectivity and incremental
growth are both required for the power law distribution observed in the web. The importance of
preferential connectivity has been shown by several researchers (e.g., [16]).



Faloutsos et al. [15] observed that the internet topology exhibits power law distribution in the
form of $y = x^{a}$. When studying web characteristics, the documents can be viewed as vertices
in a graph and the hyper-links as edges between them. Various researchers (e.g., [22]) have
independently shown the power law distribution in the degree sequence of the web graphs.
Huberman and Adamic [16] showed a power law distribution in the web site sizes. See [?] for a
summary of works on web graph structure.

Medina et al. [24] showed that topologies generated by two widely used generators, the Waxman
model and the GT-ITM tool, do not have power law distribution in their degree sequences.
Palmer and Steffan [27] proposed a power law degree generator that recursively partitions the
adjacency matrix into an 80-20 distribution. However, it is unclear if their generator actually
emulates other web properties.

The power law distribution seems to be a ubiquitous property. It occurs in epidemiology [30],
population studies [28], genome distribution [17], various social phenomena [11, 26], and
massive graphs. For power law graphs in biological systems, connectivity changes appear to be
much more important than growth in size, due to the long time-scale of biological evolution.
1.2 Properties for Graph Model Comparison



Another important web graph property that has been looked at is diameter. However, there are
conflicting results in the published papers. Albert et al. stated that web graphs have the small
world phenomenon [25, 31], in which the diameter $\Delta$ is roughly $0.35 + 2.06 \lg n$, where
n is the size of the web graph and $\lg$ is taken base 10, as the value that follows requires.
For $n = 8 \times 10^{8}$, $\Delta \approx 19$. Lu [23] proved that the diameters of random power
law graphs are a logarithmic function of n under the model proposed by Aiello et al. However,
Broder et al. [12] showed that, over 75% of the time, there is no directed path between two random
vertices. If there is a path, the average distance is roughly 16 when viewing the web graph as a
directed graph, or 6.83 in the undirected case.
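The diameter estimate quoted above can be checked with a few lines of arithmetic. This is only a sketch: the function name is ours, and reading the logarithm as base 10 is an assumption (a base-2 reading would give a value near 61 instead of 19).

```python
import math


def estimated_diameter(n):
    """Evaluate 0.35 + 2.06 * log10(n); the base-10 reading is an assumption."""
    return 0.35 + 2.06 * math.log10(n)


# For n = 8 * 10^8 the estimate is about 18.69, which rounds to 19.
print(round(estimated_diameter(8e8), 2))
```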

Currently, there are few theoretical graph models (e.g., [27]) for generating power law graphs.
There are very few comparative studies that would allow us to determine which of these theoretical
models are more accurate models of the web. We only know that the model proposed by Kumar
et al. generates more bipartite cliques than other models. They believe clustering to be an
important part of web graph structures that was insufficiently represented in previous models.



1.3 New Contributions 

In this paper, we show power law graphs do not require incremental growth, by developing a graph 
model which (empirically) results in power laws by evolving a graph according to a Markov process 
while maintaining constant size and density. 

We also describe an easily computable graph property that can be used to capture cluster
information in a graph without enumerating all possible subgraphs. We use this property to compare
our model with others and with actual web data.



2 Steady State Model 

Our Steady State (SS) model is very simple in comparison with other web graph models [21, 27].
It consists of repeatedly removing and adding edges in a sparse random graph G.

Let m be O(n). We generate an initial sparse random graph G with m edges and n vertices,
by randomly adding edges between vertices until we have m edges. As discussed below, the initial
random distribution of edges is unimportant for our model.

We then iterate the following steps r times on G, where r is a parameter to our model. 

1. Pick a vertex v at random. If there is no edge incident upon v, we repeat this step until v
has nonzero degree.

2. Pick an edge $(u, v) \in G$ at random.

3. Pick a vertex x at random.

4. Pick a vertex y with probability proportional to degree.

5. If (x, y) is not an edge in G and x is not equal to y, then remove edge (u, v) and add edge
(x, y).
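The five steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the simulations: the function name, the adjacency-set representation, and the edge list with an index table are our own choices. We read step 2 as picking a random edge incident to the vertex v chosen in step 1. Step 4 is done by drawing a random endpoint of a random edge, which selects y with probability deg(y) / 2m.

```python
import random


def steady_state_graph(n, m, r, seed=None):
    """Run r rewiring steps on a random simple graph with n vertices, m edges.

    Returns a list of neighbor sets, one per vertex.
    """
    if not 1 <= m <= n * (n - 1) // 2:
        raise ValueError("need 1 <= m <= n(n-1)/2 for a simple graph")
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    edges = []  # edges[i] = (a, b) with a < b
    slot = {}   # slot[(a, b)] = index of that edge in `edges`

    def add_edge(a, b, i):
        key = (min(a, b), max(a, b))
        if i == len(edges):
            edges.append(key)
        else:
            edges[i] = key
        slot[key] = i
        adj[a].add(b)
        adj[b].add(a)

    # Initial sparse random graph: add random edges until there are m.
    while len(edges) < m:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            add_edge(a, b, len(edges))

    for _ in range(r):
        # Step 1: random vertex v, redrawn until it has nonzero degree.
        v = rng.randrange(n)
        while not adj[v]:
            v = rng.randrange(n)
        # Step 2: random edge (u, v), read here as an edge incident to v.
        u = rng.choice(tuple(adj[v]))
        # Step 3: uniformly random vertex x.
        x = rng.randrange(n)
        # Step 4: y proportional to degree (random endpoint of a random edge).
        y = rng.choice(rng.choice(edges))
        # Step 5: rewire only if the graph stays simple.
        if x != y and y not in adj[x]:
            i = slot.pop((min(u, v), max(u, v)))
            adj[u].discard(v)
            adj[v].discard(u)
            add_edge(x, y, i)
    return adj
```

A call such as `steady_state_graph(500, 1500, 100000, seed=1)` keeps n and m constant throughout, so only the degree sequence changes from step to step.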

One can view our model as an aperiodic Markov chain with some limiting distribution. If we
repeat the above steps long enough, the random graphs generated by this model will be close to
this limiting distribution, no matter what the initial random sparse graph is. Note that unlike
other models, the graphs generated by our model contain neither self-loops nor multiple
edges between two vertices.
Figure 1: Initial G(500, 1500), and G After 100K and 10M Steps



Barabasi et al. also proposed a non-growth model, which failed to produce a power law
distribution. Both models have preferential connectivity features. However, there are several
differences between our model and theirs. First, the size of our edge set is fixed and the initial
graph is generated via classical random graph models [14, 18]. Second, our model has a "rewiring"
feature similar to the one in the small world model [13, 25, 31].



2.1 Simulation Results 

We simulated our model on graphs of different sizes ($500 \le n \le 5000$) and densities
($1 \le m/n \le 3$). We repeated each simulation 5 times, and performed r = 10,000,000 edge
deletion/insertion operations on each graph. The vertices' degree distributions appear to converge
to power law distributions as the number of edge deletion/insertion operations increases. Some of
our simulation results are shown in Figures 1-4. Figures 1 and 3 show degree distributions at
various stages of the simulations. Figures 2 and 4 show degree distributions for graphs with
different densities m/n.
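One rough way to examine such a degree distribution numerically is to fit a straight line to the degree histogram on log-log axes. The text does not say how the fit was judged beyond the plots, so the sketch below is our own illustration with hypothetical function names. A least-squares slope on a log-log histogram is a crude diagnostic, not a rigorous test for a power law.

```python
import math
from collections import Counter


def degree_histogram(degrees):
    """Map each positive degree k to the number of vertices with degree k."""
    return Counter(k for k in degrees if k > 0)


def loglog_slope(hist):
    """Least-squares slope of log(count) against log(degree)."""
    pts = [(math.log(k), math.log(c)) for k, c in hist.items() if c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two distinct degrees")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Synthetic check: counts proportional to k^(-2) give a slope close to -2.
synthetic = [k for k in range(1, 7) for _ in range(3600 // k**2)]
print(loglog_slope(degree_histogram(synthetic)))
```

For a graph stored as a list of neighbor sets, the input would be `[len(nbrs) for nbrs in adj]`.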



3 Cluster Information 

Given a subgraph S of G, $d_S(v)$ is the degree of vertex v in S. Here we examine the maximum,
over all subgraphs, of the minimum degree. We denote it $d_{\max}$ and define it as

$d_{\max} = \max_{S} \min_{v \in S} d_S(v)$.

We use $d^{M}_{\max}$ to denote the value obtained under graph model M.

To compute $d_{\max}$ for a graph G, we set $d_{\max}$ to 0 and perform the following steps until
G becomes empty:

1. Select a minimum degree vertex v from G.

2. Set $d_{\max}$ to d(v) if d(v) > $d_{\max}$.

3. Remove vertex v and its edges from G.

Figure 2: G(500, 500), G(500, 1000), and G(500, 1500) After 10M Steps

Figure 3: Initial G(3000, 9000), and G After 100K and 10M Steps

Figure 4: G(3000, 3000), G(3000, 6000), and G(3000, 9000) After 10M Steps

Figure 5: Minimal Degree Vertex Elimination

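The above steps correctly compute $d_{\max}$. Let S be a subgraph attaining $d_{\max}$. Until the
first vertex of S is removed, every vertex of S has degree at least $d_{\max}$ in the current
graph, so the degree recorded when that vertex is removed is at least $d_{\max}$. Conversely,
every recorded value is the minimum degree of some subgraph of G and is therefore at most
$d_{\max}$. The minimal degree elimination sequence for the graph in Figure 5 will be B, C, A, D,
and E. The degrees when those vertices got eliminated are 1, 1, 2, 1, and 1. $d_{\max}$ is 2
since max{1, 1, 2, 1, 1} = 2.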
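The elimination procedure can be sketched in a few lines of Python. The function name and the dictionary-of-sets representation are our own. The example graph is a hypothetical one (a triangle with a two-vertex tail), not the graph of Figure 5. This version rescans for the minimum at every step and so takes quadratic time; bucketing vertices by degree would make it faster.

```python
def d_max(adj):
    """Largest minimum degree over all subgraphs, by min-degree elimination.

    adj maps each vertex to the set of its neighbors (undirected, simple).
    """
    live = {v: set(nbrs) for v, nbrs in adj.items()}  # working copy
    best = 0
    while live:
        # Step 1: select a minimum degree vertex.
        v = min(live, key=lambda w: len(live[w]))
        # Step 2: record its degree if it exceeds the current maximum.
        best = max(best, len(live[v]))
        # Step 3: remove v and its edges.
        for w in live.pop(v):
            live[w].discard(v)
    return best


# Hypothetical example: triangle A-B-C, with D attached to A and E to D.
example = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}
print(d_max(example))  # 2, attained by the triangle
```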

Observation 1 For any model M that constructs a graph by adding a vertex at a time, and for
which each newly added vertex has the same degree d, $d^{M}_{\max} = d$.



Thus the Barabasi and Albert model (BA) or the linear growth copying model has the same value
of $d_{\max}$ for graphs of all sizes once $d = m/n$ is fixed.
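Observation 1 can be illustrated with a small experiment. The sketch below is ours and makes two simplifying assumptions: the graph is seeded with a clique on d + 1 vertices, so that every vertex starts with degree d, and each new vertex attaches to d distinct existing vertices chosen uniformly rather than preferentially. The choice of targets does not affect the value of d_max, only the degree distribution. All names are hypothetical.

```python
import random


def d_max(adj):
    """Largest minimum degree over all subgraphs, by min-degree elimination."""
    live = {v: set(nbrs) for v, nbrs in adj.items()}
    best = 0
    while live:
        v = min(live, key=lambda w: len(live[w]))
        best = max(best, len(live[v]))
        for w in live.pop(v):
            live[w].discard(v)
    return best


def grow_graph(n, d, seed=None):
    """Grow a graph one vertex at a time; each new vertex gets degree d."""
    rng = random.Random(seed)
    # Seed graph (an assumption): a clique on d + 1 vertices.
    adj = {v: set(range(d + 1)) - {v} for v in range(d + 1)}
    for v in range(d + 1, n):
        targets = rng.sample(range(v), d)  # d distinct earlier vertices
        adj[v] = set(targets)
        for t in targets:
            adj[t].add(v)
    return adj


# d_max stays at d no matter how large the graph grows.
for n in (10, 100, 400):
    print(n, d_max(grow_graph(n, 3, seed=0)))
```

Every vertex ends with degree at least d, so d_max is at least d. In any subgraph, the most recently added vertex has at most d neighbors, so d_max is at most d.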

Observation 2 The web graph generated by the linear model has minimum vertex degree of $d = m/n$.

Hence, the linear model may not encapsulate all the crucial properties in a web graph if there
are significant numbers of vertices with degree less than $m/n$.

3.1 Web Crawl and Simulation Data 

We performed a web crawl on various Computer Science department web sites. We then used
the ACL model to generate new graphs from degree sequences in the actual web graphs. We
also ran the SS model using n and m values from the actual web graphs with 10,000,000 edge
insertion/deletion steps. For each graph, we ran both models 5 times. The following table shows
the means $\mu$ and the standard deviations $\sigma$ for $d_{\max}$ values using the ACL model
and the SS model.



Site        n     m      $d_{\max}$  $\mu_{ACL}$  $\sigma_{ACL}$  $\mu_{SS}$  $\sigma_{SS}$
arizona     5315  16892  15          10                           8
berkeley    2826  22957  45          21.6         0.547           16
caltech     622   4830   7           5.8          0.447           12.8        0.447
cmu         2052  23821  57          37.2         0.447           20          0.707
cornell     7145  14919  17          19.4         0.547           6
harvard     915   9327   21          12.6         0.894           16.4        0.547
mit         4861  15360  31          24.4         0.547           7
nd          1913  16328  33          29.2         0.447           15.4        0.547
stanford    2553  25693  27          14.6         0.547           18.4        0.547
ucla        2718  19755  22          16.6         0.547           14.2        0.447
ucsb        5236  10338  22          13.8         0.447           5
ucsd        553   3885   15          7.2          0.447           11.8        0.447
uiowa       1410  12258  8           8.8          0.447           15.2        0.447
uiuc        5623  28872  29          21                           11.8        0.836
unc         1465  5446   17          9.8          0.447
washington  7001  24901  17          12

Table 1: $d_{\max}$ from Actual Web Crawl and Model Simulation



In general, the ACL model and the SS model generate less clustered graphs than what we see in
actual web graphs: for most sites, the mean $d_{\max}$ under both models falls below the value
measured in the crawl. This implies that we need a more detailed model of web graph clustering
behavior.



4 Conclusion and Open Problems 

Previously, researchers have conjectured that preferential connectivity and incremental growth are
necessary factors in creating power law graphs. In this paper, we provide a model of graph evolution
that produces power laws without growth. Our Steady State model is very simple in comparison with
other graph models [21]. It also does not require prior degree sequences as in the ACL model.



The difficulty in comparing various models (e.g., [21]) is that each model has different
parameters and inputs. Here we provide a simple graph property $d_{\max}$ that captures the
clustering behavior of graphs without a complicated subgraph enumeration algorithm. It can be
useful in gauging the accuracy of various models.

From our web crawl data, we know that the linear models such as Barabasi's are not the best
ones to use when considering $d_{\max}$. Neither the ACL nor the SS model generates subgraphs
as dense as those in the actual web graphs. Thus, we need a better web graph model
that mimics actual web graph clustering behavior.

Here are some of our open problems: 

1. Can one prove theoretically that the SS method actually has a power law distribution? 

2. How long does it take for our model to reach a steady state? 
3. What are other simple web graph properties that we can use to determine the accuracy of
various models?

4. Are there any techniques, such as graph products, that we can use to generate realistic massive
web graphs in relatively short times?